Leonard - EAKOS 2009

VAST 2009 Challenge
Challenge 1: - Badge and Network Traffic

Authors and Affiliations:

Lorne Leonard, The Pennsylvania State University - Research Computing & Cyberinfrastructure, lorne_leonard@hotmail.com [PRIMARY contact]

Tool(s):

EAKOS is a collection of tools that demonstrates how one can interface with web-based visualization and GIS services. The toolset is an early prototype developed by Lorne Leonard in his spare time during the 2008 Christmas break and on the weekends leading up to the competition deadline. Lorne works with researchers and faculty at The Pennsylvania State University and uses the toolset to demonstrate potential visualization and analytical solutions that support their research goals.

Video:

Leonard_Vast2009_Challenge1.mov

ANSWERS:

MC1.1: Identify which computer(s) the employee most likely used to send information to his contact in a tab-delimited table which contains for each computer identified: when the information was sent, how much information was sent and where that information was sent. Please name the file Traffic.txt and place it in the same directory as your index.htm file. Please see the format required in the Task Descriptions.

Traffic.txt

MC1.2: Characterize the patterns of behavior of suspicious computer use.

I approached this challenge by identifying who violated policy by entering the building, and more importantly the classified room, without badging in. Using the "Prox Card" dataset, I plotted each employee ID's sequence of daily activities as recorded by the prox card readers (Figure 1). Coding this tool took approximately three days. IDs that "piggybacked" in or out of the classified room are highlighted in orange for the entire day, and the event itself is marked with a yellow line, as shown in Figure 1.

Figure 1: Prox Card data for ID 30 for the entire month duration.

Automatically highlighting the days on which IDs broke protocol made it quick and easy (about 5 minutes) to scroll through the 60 IDs and identify that IDs 30, 38 and 49 broke protocol by piggybacking in or out of the classified room (Table 1). ID 30 was the worst offender, with three piggyback events into the classified room. What was this person doing between entering or leaving the classified room and the next recorded event?

ID	Day	Event	Event Start Time	Next Event
30	10	Did not prox in-classified	10:33 AM	5:05 PM
30	17	Did not prox in-classified	11:31 AM	2:03 PM
30	24	Did not prox in-classified	9:00 AM	10:52 AM
38	4	Did not prox out-classified	1:12 PM	2:15 PM
49	8	Did not prox out-classified	12:56 PM	2:06 PM
Table 1: IDs who piggybacked within the classified room.
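The check behind Table 1 was done visually, but the same rule can be sketched in code: a "prox out" of the classified room with no preceding "prox in" means the person entered without badging, and a "prox in" followed by any event other than a "prox out" means the person left without badging. The sketch below is only an illustration of that rule, not the EAKOS implementation. The event labels, the function name and the sample events are assumptions; the sample uses made-up IDs and times rather than challenge data.

```python
from collections import defaultdict

# Assumed event labels; adjust to match the actual Prox Card log.
IN_BUILDING = "prox-in-building"
IN_CLASSIFIED = "prox-in-classified"
OUT_CLASSIFIED = "prox-out-classified"


def find_classified_piggybacks(events):
    """Flag unmatched classified-room events.

    events: iterable of (employee_id, iso_timestamp, event_type).
    Returns a sorted list of (employee_id, iso_timestamp, description).
    """
    by_id = defaultdict(list)
    for emp, ts, kind in events:
        by_id[emp].append((ts, kind))

    flagged = []
    for emp, rows in by_id.items():
        rows.sort()  # ISO timestamps sort chronologically as strings
        entered_at = None  # time of the last unmatched prox-in-classified
        for ts, kind in rows:
            if kind == OUT_CLASSIFIED:
                if entered_at is None:
                    flagged.append((emp, ts, "Did not prox in-classified"))
                entered_at = None
            else:
                # Any other event while still "inside" means the exit was not badged.
                if entered_at is not None:
                    flagged.append((emp, entered_at, "Did not prox out-classified"))
                entered_at = ts if kind == IN_CLASSIFIED else None
    return sorted(flagged)


# Synthetic events for three made-up IDs (not taken from the challenge data).
sample = [
    (1, "2008-01-02T08:00:00", IN_BUILDING),
    (1, "2008-01-02T10:00:00", OUT_CLASSIFIED),  # left a room never badged into
    (2, "2008-01-02T08:00:00", IN_BUILDING),
    (2, "2008-01-02T13:00:00", IN_CLASSIFIED),
    (2, "2008-01-03T08:00:00", IN_BUILDING),     # never badged out
    (3, "2008-01-02T08:00:00", IN_BUILDING),
    (3, "2008-01-02T09:00:00", IN_CLASSIFIED),
    (3, "2008-01-02T09:30:00", OUT_CLASSIFIED),  # normal visit
]
for row in find_classified_piggybacks(sample):
    print(row)
```

The flagged timestamp is the start of the suspicious interval, which mirrors the "Event Start Time" column of Table 1.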

Each of the three events for ID 30 occurred on a Thursday. Is this an arranged agreement with his handlers?
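The weekday observation is easy to verify with the standard library, assuming the day numbers refer to January 2008 (the month used by the timestamps in Table 3).

```python
from datetime import date

# Days on which ID 30 piggybacked into the classified room (Table 1).
piggyback_days = [10, 17, 24]

# Assumption: the day numbers are days of January 2008.
weekday_numbers = [date(2008, 1, day).weekday() for day in piggyback_days]

# date.weekday() returns 0 for Monday, so 3 means Thursday.
all_thursdays = all(number == 3 for number in weekday_numbers)
print(all_thursdays)
```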

Figure 2: Prox Card data for IDs 38 and 49 for the entire month duration.

To identify who piggybacked into the building in the morning, I manually inspected which days lacked a grey marker at the start of the sequence. These results, shown in Table 2, took approximately 10 minutes to generate.

ID	Day(s)
0	17
7	2
13	8,23
27	24
37	24
38	3
39	24
48	23
49	8,22,31
50	30
51	2
54	16
55	16
58	31
59	31
Table 2: IDs who piggybacked in the morning.
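The manual inspection for Table 2 amounts to asking, for each ID and day, whether the first recorded event is a building entry. A minimal sketch of that check follows. The event label, the function name and the sample events are assumptions made for illustration, and the sample data is synthetic.

```python
from collections import defaultdict

IN_BUILDING = "prox-in-building"  # assumed label for a badged building entry


def find_morning_piggybacks(events):
    """Return {employee_id: [dates]} where the day's first event is not a building entry.

    events: iterable of (employee_id, iso_timestamp, event_type).
    """
    first_event = {}
    for emp, ts, kind in events:
        key = (emp, ts[:10])  # (ID, YYYY-MM-DD)
        if key not in first_event or ts < first_event[key][0]:
            first_event[key] = (ts, kind)

    result = defaultdict(list)
    for (emp, day), (_, kind) in sorted(first_event.items()):
        if kind != IN_BUILDING:
            result[emp].append(day)
    return dict(result)


# Synthetic events for two made-up IDs (not taken from the challenge data).
sample = [
    (1, "2008-01-02T08:00:00", "prox-in-building"),
    (1, "2008-01-02T09:00:00", "prox-in-classified"),
    (2, "2008-01-02T09:15:00", "prox-in-classified"),  # no building entry first
    (2, "2008-01-03T08:05:00", "prox-in-building"),
]
print(find_morning_piggybacks(sample))
```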

Having identified the IDs who piggybacked into the classified room, I developed another tool to help identify ID connections, traffic amounts and event times. Coding the tool, loading the data and visualizing it took about four days. I visualized the "IP traffic" dataset in two ways. The first (Figure 3) plots source IPs against destination IPs as a Treemap sized by response and request payloads. This tool also includes the embassy plan showing the room assigned to each ID, and a sequence plot against the embassy plan showing when each ID entered the building and moved in or out of the classified room. By referring to this sequence diagram and the results from the first tool above, I can identify when the piggybacking occurred. Furthermore, by hovering the mouse over the larger Treemap cells, I can see when the IP traffic happened and manually determine whether the event falls near my time of interest.

The second visualization (Figure 4) plots response versus request payloads per ID within a specified date range. The purpose of this tool is to help determine whether a traffic event identified in the first visualization is an outlier. By manually hovering over the outliers, I can see whether these events happened near a piggyback event. Hovering over the cells and collecting the data took approximately 30 minutes, and the results are shown in Table 3.

Figure 3: Traffic sizes using a Treemap and ID sequence within embassy.

Figure 4: Plotting response versus request payloads.

Source IP	AccessTime	DestIP	Socket	ReqSize	RespSize	Count
37.170.100.30	2008-01-10T10:35:10.367	10.30.138.140	80	3817	569386	29
37.170.100.30	2008-01-10T14:29:11.316	101.160.27.28	80	4202	335984	12
37.170.100.30	2008-01-10T15:00:15.208	100.104.83.89	80	44979	1751371	2
37.170.100.30	2008-01-10T15:39:57.166	105.77.95.226	80	56278	606589	3
37.170.100.30	2008-01-10T16:09:56.584	105.211.108.147	80	4953	1597330	10
37.170.100.30	2008-01-10T16:12:03.871	103.143.114.91	80	53081	599157	4
37.170.100.30	2008-01-10T16:49:38.494	10.124.235.51	80	44364	893780	4
37.170.100.30	2008-01-10T16:56:40.458	103.120.93.59	80	16568	264422	4
37.170.100.30	2008-01-17T12:38:41.768	104.73.180.170	80	6000	404896	21
37.170.100.30	2008-01-17T12:38:51.905	37.105.202.184	80	65533	27996	5
37.170.100.30	2008-01-17T13:36:47.489	103.76.60.0	80	5076	10384	2
37.170.100.30	2008-01-17T13:36:53.933	10.30.138.140	80	5627	371604	29
37.170.100.30	2008-01-24T14:46:16.832	100.226.208.157	80	4879	156342	13
37.170.100.30	2008-01-24T14:46:25.842	10.228.35.56	80	55712	55412	1
37.170.100.30	2008-01-24T17:18:16.094	10.30.138.140	80	6789	5104487	29
37.170.100.38	2008-01-04T17:28:43.475	37.109.133.151	80	63665	1880595	2
37.170.100.49	2008-01-08T16:21:30.114	106.192.237.252	80	5502	1356516	3
Table 3: IDs who piggybacked and possible payload outliers.
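The mouse-over step that produced Table 3 can be approximated by filtering the traffic log to the interval between a piggyback event and the next prox event. The sketch below is an illustration under stated assumptions, not the EAKOS code: the function name and dictionary keys are hypothetical, the rows are copied from Table 3, and the window is ID 30's day-17 interval from Table 1 (11:31 AM to 2:03 PM).

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%S.%f"  # timestamp layout used in Table 3


def traffic_in_window(records, source_ip, start, end):
    """Return records from source_ip with start <= AccessTime <= end, largest response first."""
    hits = [
        r for r in records
        if r["SourceIP"] == source_ip
        and start <= datetime.strptime(r["AccessTime"], FMT) <= end
    ]
    return sorted(hits, key=lambda r: r["RespSize"], reverse=True)


# Rows copied from Table 3 (ID 30, days 17 and 24).
records = [
    {"SourceIP": "37.170.100.30", "AccessTime": "2008-01-17T12:38:41.768", "DestIP": "104.73.180.170", "RespSize": 404896},
    {"SourceIP": "37.170.100.30", "AccessTime": "2008-01-17T12:38:51.905", "DestIP": "37.105.202.184", "RespSize": 27996},
    {"SourceIP": "37.170.100.30", "AccessTime": "2008-01-17T13:36:47.489", "DestIP": "103.76.60.0", "RespSize": 10384},
    {"SourceIP": "37.170.100.30", "AccessTime": "2008-01-17T13:36:53.933", "DestIP": "10.30.138.140", "RespSize": 371604},
    {"SourceIP": "37.170.100.30", "AccessTime": "2008-01-24T17:18:16.094", "DestIP": "10.30.138.140", "RespSize": 5104487},
]

# ID 30, day 17: piggyback event at 11:31 AM, next prox event at 2:03 PM (Table 1).
window = traffic_in_window(
    records,
    "37.170.100.30",
    datetime(2008, 1, 17, 11, 31),
    datetime(2008, 1, 17, 14, 3),
)
for r in window:
    print(r["AccessTime"], r["DestIP"], r["RespSize"])
```

Sorting by response size puts the largest transfers in the window first, which is roughly what scanning the larger Treemap cells achieves by eye.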

As previously mentioned, ID 30 is the worst offender, with piggyback events occurring on days 10, 17 and 24. The Count column in Table 3 gives the number of times the source IP contacted the destination IP during the month. The most suspicious destination IP is 10.30.138.140, where ID 30 responded with large amounts of data on days 10, 17 and 24. Additionally, ID 30 contacts this destination regularly, 29 times during the month, and appears to be masking the activity by visiting another destination IP at nearly the same time (highlighted in yellow).
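One way to look for this kind of masking is to search for consecutive requests from the same source to different destinations that occur within a few seconds of each other. The sketch below uses ID 30's rows for days 17 and 24 from Table 3. The 15-second threshold and the function name are assumptions chosen for illustration, and the pairs it finds are not necessarily the ones highlighted in the original table.

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%S.%f"  # timestamp layout used in Table 3


def near_simultaneous(records, max_gap_seconds=15.0):
    """Find consecutive requests to different destinations within max_gap_seconds.

    records: list of (access_time, dest_ip) for a single source IP.
    Returns a list of (first_dest, second_dest, gap_in_seconds).
    """
    rows = sorted((datetime.strptime(t, FMT), ip) for t, ip in records)
    pairs = []
    for (t1, ip1), (t2, ip2) in zip(rows, rows[1:]):
        gap = (t2 - t1).total_seconds()
        if ip1 != ip2 and gap <= max_gap_seconds:
            pairs.append((ip1, ip2, round(gap, 3)))
    return pairs


# ID 30 (37.170.100.30), days 17 and 24, copied from Table 3.
id30 = [
    ("2008-01-17T12:38:41.768", "104.73.180.170"),
    ("2008-01-17T12:38:51.905", "37.105.202.184"),
    ("2008-01-17T13:36:47.489", "103.76.60.0"),
    ("2008-01-17T13:36:53.933", "10.30.138.140"),
    ("2008-01-24T14:46:16.832", "100.226.208.157"),
    ("2008-01-24T14:46:25.842", "10.228.35.56"),
    ("2008-01-24T17:18:16.094", "10.30.138.140"),
]
pairs = near_simultaneous(id30)
for first, second, gap in pairs:
    print(first, second, gap)
```

On these rows the request to 10.30.138.140 on day 17 follows a request to 103.76.60.0 by about six seconds, which is the pattern the paragraph above describes.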

There is no activity near the piggyback event for IDs 38 and 49. Perhaps they are leaving the building with the data. The majority of the larger response payloads happen either before the piggyback event or some time after another visit to the classified room. A possible scenario is that each ID left the building to meet his/her contacts, discuss the collected data and confirm that it meets their needs before transmitting the information later that same day. If true, the event may have been at 1/4/2008 5:28 PM to 37.109.133.151 for ID 38 and at 1/8/2008 4:21 PM to 106.192.237.252 for ID 49. However, when these payloads are compared with each ID's payload events for the entire month, the amounts do not appear significant (Figure 5).


Figure 5A: Possible event for ID 38 compared with entire month of events. Blue circle denotes request size at 1/4/2008 5:28 PM.

Figure 5B: Possible event for ID 49 compared with entire month of events. Blue circle denotes request size at 1/8/2008 4:21 PM.
